Abstract: This paper investigates the problem of database-assisted
spectrum access in dynamic TV white spectrum networks, in which
the active user set varies over time. Since there is no central
controller and no information exchange, the users face dynamic
and incomplete information constraints. To address this
challenge, we formulate a state-based spectrum access game and a
robust spectrum access game. We prove that both games are
ordinal potential games with the (expected) aggregate weighted
interference serving as the potential functions. A distributed
learning algorithm is proposed to achieve the pure strategy Nash
equilibria (NE) of the games. It is shown that the best NE is
almost the same as the optimal solution and that the achievable
throughput of the proposed learning algorithm is very close to
the optimal one, which validates the effectiveness of the proposed
game-theoretic solution.

Index Terms: TV white spectrum, geo-location database, ordinal
potential game, learning automata.


I. Introduction 

Employing TV White Spectrum (TVWS) [?]-[?] has
been regarded as a promising approach to solving the spectrum
shortage problem in future wireless networks, as it can
effectively improve spectrum efficiency by allowing
unlicensed users to dynamically access idle TV channels.
For TVWS, it has been shown that obtaining spectrum
availability information by querying a geo-location database
is more efficient than performing spectrum sensing alone [4],
[?]. Also, the geo-location database approach has been widely
supported by standards bodies and industrial organizations [?]-[?].

Current research in this field has mainly focused on: i)
constructing and maintaining the geo-location database, e.g.,
[?]-[?], and ii) developing business models to analyze the
revenues of the TV spectrum holders and the unlicensed users,
e.g., [?]-[?]. Since no centralized controller is available,
how to choose a channel for transmission in a distributed
manner, so as to avoid mutual interference among the users,
remains a key challenge. However, only a few preliminary
results have been reported recently [?], [?], and hence it is
important to study efficient database-assisted distributed
spectrum access strategies.

In this paper, we consider a dynamic and distributed TVWS
network. Specifically, in practical applications, users do not
access the channels when they have no data to
transmit. To capture these dynamics, it is assumed that each user
becomes active or inactive according to an active probability in
each decision period. As a result, the active user set varies over
time. Furthermore, since there is no information exchange, a user
does not know the chosen channels, the current states (active
or inactive), or the active probabilities of other users. These
dynamic and incomplete information constraints make the task
of developing efficient distributed spectrum access strategies
challenging.

To solve this problem, we resort to game models [?]
and distributed learning technology. Specifically, we formulate
a state-based spectrum access game and a robust spectrum
access game, and propose a distributed learning algorithm to
achieve desirable solutions. The main contributions can be
summarized as follows:

1) For an arbitrary active user set, we formulate the problem
of distributed spectrum access as a state-based non-cooperative
game. It is proved that the state-based game is
an ordinal potential game with the aggregate weighted
interference serving as the potential function. To address
the challenges caused by the varying active user set,
we formulate a robust spectrum access game, which is
also proved to be an ordinal potential game. Finally, we
propose a distributed learning algorithm to achieve the
pure strategy Nash equilibria of the formulated games.

2) It is shown that the best Nash equilibrium is almost
the same as the optimal solution, which validates the
effectiveness of the formulated game models. In addition,
the achievable throughput of the proposed learning
algorithm is very close to the optimal one, which also
validates the distributed learning algorithm in dynamic
networks.

The most closely related work is [?], in which game-theoretic
database-assisted spectrum sharing strategies were investigated.
This work differs in two respects: i) all users are assumed
to be always active in [?], while we consider a network with a
varying number of active users, and ii) the spectrum access
algorithms in [?] need information exchange, while
the proposed learning-based spectrum access algorithm is
fully distributed. Also, the problem of distributed spectrum
access for minimizing the aggregate weighted interference
was studied in [?] and in our previous work [20], [?].
The differences in the current work are that we optimize the
throughput directly and that the active user set in each decision
period changes randomly. Some simulation results
on distributed spectrum access with a changing
active user set were preliminarily reported in our recent work
[22]. In this work, rigorous analysis and proofs are given and
more simulation results are presented to validate the proposed
game-theoretic solution.

Fig. 1. The illustrative diagram of database-assisted spectrum access.

The rest of this paper is organized as follows. In Section
II, the system model and problem formulation are presented.
In Section III, we formulate the state-based spectrum access
game and the robust spectrum access game, analyze their
properties, and propose a distributed learning-based spectrum
access algorithm. Finally, simulation results and discussion are
presented in Section IV and conclusions are drawn in Section V.

II. System Model and Problem Formulation 

A. System model 

We consider a distributed network with $N$ cognitive users
and $M$ channels. Here, each cognitive user corresponds
to a communication link consisting of a transmitter
and a receiver, e.g., a cognitive access point (AP) and its
serving clients. Each cognitive user queries the geo-location
spectrum database for spectrum availability; the database
specifies the available channel set $\mathcal{A}_n$ and the maximum
allowable transmission power $P_n$ for each user $n$. With the
obtained information on the available channels and maximum
transmission power, the users determine suitable spectrum access
strategies through learning. An illustration of
database-assisted spectrum access is shown in Fig. 1.

To address the user traffic in practical applications, we
consider a network with a varying number of active users.
Specifically, it is assumed that each user performs channel
access in each slot with probability $\lambda_n$, $0 < \lambda_n < 1$. Such a
model captures general kinds of dynamics in wireless networks
[23], e.g., a user becomes active only when it has data to
transmit and is inactive otherwise; it can also be regarded as
a high-level abstraction of the user traffic, i.e., the active
probability corresponds to the probability of a non-empty buffer.

B. Problem formulation 

To capture the changing number of active users, we define
the system state as $S = \{s_1, \ldots, s_N\}$, where $s_n = 1$ indicates
that user $n$ is active while $s_n = 0$ means that it is inactive. The
system state probability is then given by
$p(s_1, \ldots, s_N) = \prod_{n=1}^{N} p_n$, where $p_n$ is determined as follows:

$$ p_n = \begin{cases} \lambda_n, & s_n = 1, \\ 1 - \lambda_n, & s_n = 0. \end{cases} \tag{1} $$

Denote an arbitrary non-empty active user set as $\mathcal{B}$, i.e.,
$\mathcal{B} = \{n \in \mathcal{N} : s_n = 1\}$. For presentation, denote the set of all
the active user sets as $\Gamma$, and denote the probability of an active
user set by $\mu(\mathcal{B})$. We have
$\sum_{\mathcal{B} \in \Gamma} \mu(\mathcal{B}) = 1 - \mu(\mathcal{B}_0)$,
where $\mu(\mathcal{B}_0) = \prod_{n=1}^{N} (1 - \lambda_n)$ is the probability that all the
users are inactive.
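
As an illustrative sketch of the state model above (not part of the original analysis), the distribution over non-empty active user sets can be enumerated directly for a small network. The function names and the three active probabilities below are hypothetical; users are indexed from 0 for convenience.

```python
from itertools import product

def state_probability(state, lam):
    """Probability of a system state (s_1, ..., s_N), following (1)."""
    p = 1.0
    for s_n, lam_n in zip(state, lam):
        p *= lam_n if s_n == 1 else 1.0 - lam_n
    return p

def active_set_distribution(lam):
    """Map every non-empty active user set (a frozenset) to its probability."""
    mu = {}
    for state in product((0, 1), repeat=len(lam)):
        active = frozenset(n for n, s_n in enumerate(state) if s_n == 1)
        if active:  # the all-inactive state is excluded from the set of active sets
            mu[active] = state_probability(state, lam)
    return mu

lam = [0.8, 0.5, 0.3]  # hypothetical active probabilities
mu = active_set_distribution(lam)
mu_empty = (1 - 0.8) * (1 - 0.5) * (1 - 0.3)  # probability that nobody is active
```

With these definitions, `sum(mu.values())` agrees with `1 - mu_empty` up to floating-point error. The enumeration is exponential in the number of users, so it is only practical for small illustrative networks.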

With the spectrum availability information obtained from
the database, user $n$ chooses a channel $a_n \in \mathcal{A}_n$ for data
transmission. For any active user set $\mathcal{B}$ and channel selection
profile $(a_n, a_{-n})$, the received Signal-to-Interference-plus-Noise
Ratio (SINR) of an active user $n$ is determined by:

$$ \eta_n(\mathcal{B}, a_n, a_{-n}) = \frac{P_n d_n^{-\alpha}}{\sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n} P_i d_{in}^{-\alpha} + \sigma}, \tag{2} $$

where $\alpha$ is the path loss factor, $d_n$ is the distance between user
$n$ and its dedicated receiver, $d_{in}$ is the distance between users $i$
and $n$, $\sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n} P_i d_{in}^{-\alpha}$ is the aggregate interference
from all other active users also choosing channel $a_n$, and $\sigma$
is the background noise. Then, the achievable throughput of
user $n$ is given by:

$$ R_n(\mathcal{B}, a_n, a_{-n}) = B \log\left(1 + \eta_n(\mathcal{B}, a_n, a_{-n})\right), \tag{3} $$

where $B$ is the channel bandwidth. Therefore, there are two
possible optimization goals for each user, i.e.,

$$ \text{P1}: \max_{a_n} R_n(\mathcal{B}, a_n, a_{-n}), \quad \forall \mathcal{B} \in \Gamma, \tag{4} $$

or

$$ \text{P2}: \max_{a_n} \mathrm{E}_{\mathcal{B}}\left[R_n(\mathcal{B}, a_n, a_{-n})\right] = \sum_{\mathcal{B} \in \Gamma} \mu(\mathcal{B}) R_n(\mathcal{B}, a_n, a_{-n}). \tag{5} $$
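
The quantities in (2), (3) and (5) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the logarithm is taken to base 2, a user is assumed to obtain zero throughput in slots where it is inactive, users are indexed from 0, and the two-user network `net` is hypothetical.

```python
import math
from itertools import combinations

def sinr(n, active, channels, power, dist, d_rx, alpha, noise):
    """SINR of active user n as in (2); only co-channel active users interfere."""
    interference = sum(
        power[i] * dist[i][n] ** (-alpha)
        for i in active
        if i != n and channels[i] == channels[n]
    )
    return power[n] * d_rx[n] ** (-alpha) / (interference + noise)

def throughput(n, active, channels, power, dist, d_rx, alpha, noise, bw):
    """Achievable throughput as in (3), assuming a base-2 logarithm."""
    eta = sinr(n, active, channels, power, dist, d_rx, alpha, noise)
    return bw * math.log2(1.0 + eta)

def expected_throughput(n, channels, lam, **net):
    """Objective of P2 for user n, summed over the active sets that contain n."""
    others = [i for i in range(len(lam)) if i != n]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            active = set(subset) | {n}
            prob = lam[n]
            for i in others:
                prob *= lam[i] if i in active else 1.0 - lam[i]
            total += prob * throughput(n, active, channels, **net)
    return total

# Hypothetical two-user network (powers in W, distances in m).
net = dict(power=[0.35, 0.2], dist=[[0.0, 100.0], [100.0, 0.0]],
           d_rx=[20.0, 20.0], alpha=4.0, noise=1e-13, bw=6e6)
lam = [0.8, 0.5]
same_channel = expected_throughput(0, [0, 0], lam, **net)
different_channels = expected_throughput(0, [0, 1], lam, **net)
```

When the two users pick different channels, the expected throughput of user 0 reduces to its active probability times its interference-free throughput; sharing a channel lowers it whenever the other user is active.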

Since the network is always changing, it is not feasible
to optimize the achievable throughput for each active user
set. Thus, we focus on solving P2. However, solving P2
is challenging due to the following imperfect
information constraints:

• Dynamic: the active user set in the system is always
changing; in particular, it may change randomly in each
decision period.

• Incomplete: there is no information exchange among the
users, which means that: i) a user does not know the active
probabilities and chosen channels of other users, and ii)
the state distribution $\mu(\mathcal{B})$ is unknown to all the users.

Based on the above analysis, centralized approaches
are not available, and we need to develop a distributed,
learning-based approach for solving problem P2.

III. Spectrum Access Games and Distributed 
Learning Algorithm 

Since no centralized controller is available in the considered
distributed network and all the users choose their spectrum access
strategies distributively and autonomously, we formulate this
problem as a non-cooperative game. In the following, we
present the formulated game models, analyze their properties,
and propose a distributed learning algorithm that converges to
stable solutions in a dynamic environment.

A. Game formulation and analysis 

In this subsection, we first present a state-based spectrum 
access game, in which an inherent system state specifies the 
active user set. Based on the state-based game, we then present 
a robust game, in which the expectations over all possible 
system states are considered. Note that the state-based game
corresponds to problem P1 while the robust game corresponds
to problem P2.

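1) State-based spectrum access game: Formally, the state-based
spectrum access game model is denoted as
$\mathcal{F}_1 = [\mathcal{N}, \mathcal{B}, \{\mathcal{A}_n\}_{n \in \mathcal{N}}, \{u_n(\mathcal{B})\}_{n \in \mathcal{N}}]$,
where $\mathcal{N} = \{1, \ldots, N\}$ is the set of players (users), $\mathcal{B}$ is the
active user set, $\mathcal{A}_n$ is the set of
available actions (channels) for user $n$, and $u_n(\mathcal{B})$ is the utility
function of player $n$. The utility function is defined as the
achievable transmission rate, i.e.,

$$ u_n(\mathcal{B}, a_n, a_{-n}) = R_n(\mathcal{B}, a_n, a_{-n}), \quad \forall n \in \mathcal{B}, \forall \mathcal{B} \in \Gamma. \tag{6} $$

Each user in the game intends to maximize its individual utility,
which means that the state-based spectrum access game can
be expressed as:

$$ (\mathcal{F}_1): \max_{a_n \in \mathcal{A}_n} u_n(\mathcal{B}, a_n, a_{-n}), \quad \forall n \in \mathcal{B}. \tag{7} $$

In order to investigate the properties of $\mathcal{F}_1$, we first present
the following definitions, which are directly drawn from [24].

Definition 1 (Nash equilibrium). For an arbitrary active user
set $\mathcal{B}$, a spectrum access profile $a^* = (a_1^*, \ldots, a_{|\mathcal{B}|}^*)$ is a pure
strategy NE if and only if no player can improve its utility by
deviating unilaterally, i.e.,

$$ u_n(\mathcal{B}, a_n^*, a_{-n}^*) \geq u_n(\mathcal{B}, a_n, a_{-n}^*), \quad \forall n \in \mathcal{B}, \forall a_n \in \mathcal{A}_n, a_n \neq a_n^*. \tag{8} $$

Definition 2 (Ordinal potential game). A game is an ordinal
potential game (OPG) if there exists an ordinal potential
function $\phi: \mathcal{A}_1 \times \cdots \times \mathcal{A}_N \to \mathbb{R}$ such that for all $n \in \mathcal{N}$, all
$a_n \in \mathcal{A}_n$, and all $a'_n \in \mathcal{A}_n$, the following holds:

$$ u_n(\mathcal{B}, a_n, a_{-n}) - u_n(\mathcal{B}, a'_n, a_{-n}) > 0 \Leftrightarrow \phi(\mathcal{B}, a_n, a_{-n}) - \phi(\mathcal{B}, a'_n, a_{-n}) > 0. \tag{9} $$

That is, the change in the utility function caused by the
unilateral action change of an arbitrary user has the same
trend as that in the ordinal potential function. Following a
methodology similar to that presented in [16], we have the
following theorem.

Theorem 1. For any active user set $\mathcal{B}$, the state-based
spectrum access game $\mathcal{F}_1$ is an ordinal potential game.

Proof: For presentation, for any active user set $\mathcal{B}$ and an
arbitrary user $n \in \mathcal{B}$, denote

$$ v_n(\mathcal{B}, a_n, a_{-n}) = -\sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n} P_i P_n d_{in}^{-\alpha}, \tag{10} $$

which can be regarded as the weighted interference [?]
experienced by user $n$. Define $\phi: \mathcal{A}_1 \times \cdots \times \mathcal{A}_{|\mathcal{B}|} \to \mathbb{R}$
as

$$ \phi(\mathcal{B}, a_n, a_{-n}) = \sum_{n \in \mathcal{B}} v_n(\mathcal{B}, a_n, a_{-n}) = -\sum_{n \in \mathcal{B}} \sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n} P_i P_n d_{in}^{-\alpha}, \tag{11} $$

which is the aggregate weighted interference experienced by
all the active users.

If an arbitrary player $n$ unilaterally changes its channel
selection from $a_n$ to $a_n^*$, then the change in $v_n$ caused by
this unilateral change is as follows:

$$ v_n(\mathcal{B}, a_n^*, a_{-n}) - v_n(\mathcal{B}, a_n, a_{-n}) = \sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n} P_i P_n d_{in}^{-\alpha} - \sum_{i \in \mathcal{B} \setminus \{n\}: a_i = a_n^*} P_i P_n d_{in}^{-\alpha}. \tag{12} $$

For analysis convenience, denote the set of users choosing the
same channel as user $n$ by
$\mathcal{I}_n(a_n, \mathcal{B}) = \{i \in \mathcal{B} \setminus \{n\} : a_i = a_n\}$.
Then, the change in the potential function $\phi$ caused by the
unilateral change of user $n$ can be expressed as follows:

$$
\begin{aligned}
\phi(\mathcal{B}, a_n^*, a_{-n}) - \phi(\mathcal{B}, a_n, a_{-n})
&= v_n(\mathcal{B}, a_n^*, a_{-n}) - v_n(\mathcal{B}, a_n, a_{-n}) \\
&\quad + \sum_{i \in \mathcal{I}_n(a_n, \mathcal{B})} P_n P_i d_{ni}^{-\alpha} - \sum_{i \in \mathcal{I}_n(a_n^*, \mathcal{B})} P_n P_i d_{ni}^{-\alpha} \\
&= 2\left(v_n(\mathcal{B}, a_n^*, a_{-n}) - v_n(\mathcal{B}, a_n, a_{-n})\right),
\end{aligned}
\tag{13}
$$

where the two sums account for the users on the old channel, which no
longer receive interference from user $n$, and the users on the new
channel, which now do; the last equality uses $d_{ni} = d_{in}$ together
with (12).

Since $u_n$ and $v_n$ are related by

$$ u_n(\mathcal{B}, a_n, a_{-n}) = B \log\left(1 + \frac{P_n d_n^{-\alpha}}{-v_n(\mathcal{B}, a_n, a_{-n})/P_n + \sigma}\right), \tag{14} $$

and $\log\left(1 + \frac{P_n d_n^{-\alpha}}{-x/P_n + \sigma}\right)$ is increasing with respect to $x$, it
follows that:

$$ \left(u_n(\mathcal{B}, a_n^*, a_{-n}) - u_n(\mathcal{B}, a_n, a_{-n})\right) \times \left(v_n(\mathcal{B}, a_n^*, a_{-n}) - v_n(\mathcal{B}, a_n, a_{-n})\right) \geq 0, \quad \forall a_n, a_n^* \in \mathcal{A}_n. \tag{15} $$

For any active user set $\mathcal{B}$, combining (13) and (15) yields the
following:

$$ \left(u_n(\mathcal{B}, a_n^*, a_{-n}) - u_n(\mathcal{B}, a_n, a_{-n})\right) \times \left(\phi(\mathcal{B}, a_n^*, a_{-n}) - \phi(\mathcal{B}, a_n, a_{-n})\right) \geq 0, \quad \forall a_n, a_n^* \in \mathcal{A}_n, \tag{16} $$

which satisfies the definition of OPG, as characterized by (9).
Thus, the state-based spectrum access game $\mathcal{F}_1$ is an ordinal
potential game with $\phi$ serving as the potential function, which
proves Theorem 1. ■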
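
The ordinal potential property can also be checked numerically on a small instance. The sketch below is a sanity check rather than a substitute for the proof: it enumerates every active user set, channel profile, and unilateral deviation for a hypothetical four-user, two-channel network and verifies that the utility change and the potential change never have opposite signs. All numerical values, the base-2 logarithm, and the assumption that every user may use both channels are illustrative choices.

```python
import itertools
import math

ALPHA, NOISE, BW, D_RX = 4.0, 1e-13, 6e6, 20.0    # illustrative parameters
POWER = [0.10, 0.20, 0.35, 0.25]                  # W, hypothetical
POS = [(0, 0), (100, 0), (0, 150), (220, 90)]     # m, hypothetical
N, M = len(POWER), 2
DIST = [[math.dist(p, q) for q in POS] for p in POS]

def v(n, active, a):
    """Weighted interference of user n as in (10); it is never positive."""
    return -sum(POWER[i] * POWER[n] * DIST[i][n] ** (-ALPHA)
                for i in active if i != n and a[i] == a[n])

def u(n, active, a):
    """Utility (6), computed from v through relation (14)."""
    interference = -v(n, active, a) / POWER[n]
    return BW * math.log2(1.0 + POWER[n] * D_RX ** (-ALPHA) / (interference + NOISE))

def phi(active, a):
    """Potential function (11): sum of the weighted interference terms."""
    return sum(v(n, active, a) for n in active)

def check_ordinal_potential():
    """Verify (13) and (16) for every unilateral deviation; return the count."""
    checked = 0
    for k in range(1, N + 1):
        for active in itertools.combinations(range(N), k):
            for a in itertools.product(range(M), repeat=N):
                for n in active:
                    for new_ch in range(M):
                        if new_ch == a[n]:
                            continue
                        b = a[:n] + (new_ch,) + a[n + 1:]
                        du = u(n, active, b) - u(n, active, a)
                        dv = v(n, active, b) - v(n, active, a)
                        dphi = phi(active, b) - phi(active, a)
                        scale = abs(phi(active, a)) + abs(phi(active, b)) + 1e-30
                        assert abs(dphi - 2 * dv) <= 1e-9 * scale   # relation (13)
                        assert du * dphi >= 0                       # relation (16)
                        checked += 1
    return checked

num_checked = check_ordinal_potential()
```

The factor of two in (13) shows up here as `dphi` matching `2 * dv` up to rounding: the deviating user's own interference changes by the same amount as the total interference it causes to the others.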

2) Robust spectrum access game: As discussed above, it
is not feasible to perform optimization for each active user
set since the network is always changing. Thus, based on the
state-based spectrum access game $\mathcal{F}_1$, we formulate a robust
spectrum access game below. Specifically, the robust spectrum access
game is denoted as $\mathcal{F}_2 = [\mathcal{N}, \{\mathcal{A}_n\}_{n \in \mathcal{N}}, \{w_n\}_{n \in \mathcal{N}}]$, where
$\mathcal{N} = \{1, \ldots, N\}$ is the set of players (users), $\mathcal{A}_n$ is the set of
available actions (channels) for user $n$, and $w_n$ is the utility
function of player $n$. The utility function in the robust spectrum
access game is defined as the expected achievable transmission
rate over all possible active user sets [22], [23], [25], i.e.,

$$ w_n(a_n, a_{-n}) = \mathrm{E}_{\mathcal{B}}\left[u_n(\mathcal{B}, a_n, a_{-n})\right] = \sum_{\mathcal{B} \in \Gamma} \mu(\mathcal{B}) u_n(\mathcal{B}, a_n, a_{-n}). \tag{17} $$

Similarly, the robust spectrum access game can be expressed
as:

$$ (\mathcal{F}_2): \max_{a_n \in \mathcal{A}_n} w_n(a_n, a_{-n}), \quad \forall n \in \mathcal{N}. \tag{18} $$

Theorem 2. The robust spectrum access game $\mathcal{F}_2$ is also an
ordinal potential game.

Proof: We define the potential function as:

$$ \Phi(a_n, a_{-n}) = \mathrm{E}_{\mathcal{B}}\left[\phi(\mathcal{B}, a_n, a_{-n})\right], \tag{19} $$

where $\phi$ is specified by (11).

If an arbitrary player $n$ unilaterally changes its channel
selection from $a_n$ to $a_n^*$, then the change in $w_n$ caused by
this unilateral change is as follows:

$$ w_n(a_n^*, a_{-n}) - w_n(a_n, a_{-n}) = \mathrm{E}_{\mathcal{B}}\left[u_n(\mathcal{B}, a_n^*, a_{-n}) - u_n(\mathcal{B}, a_n, a_{-n})\right]. \tag{20} $$

Similarly, the change in $\Phi$ is determined by:

$$ \Phi(a_n^*, a_{-n}) - \Phi(a_n, a_{-n}) = \mathrm{E}_{\mathcal{B}}\left[\phi(\mathcal{B}, a_n^*, a_{-n}) - \phi(\mathcal{B}, a_n, a_{-n})\right]. \tag{21} $$

Using the result obtained in (16), the following always holds:

$$ \left(w_n(a_n^*, a_{-n}) - w_n(a_n, a_{-n})\right) \times \left(\Phi(a_n^*, a_{-n}) - \Phi(a_n, a_{-n})\right) \geq 0, \quad \forall a_n, a_n^* \in \mathcal{A}_n. \tag{22} $$

Thus, it is proved that the robust spectrum access game $\mathcal{F}_2$ is
also an ordinal potential game with $\Phi(a_n, a_{-n})$ serving as the
potential function. ■

3) Discussion on the game models: An ordinal potential game
(OPG) admits the following two promising features [24]: (i)
every OPG has at least one pure strategy Nash equilibrium,
and (ii) an action profile that maximizes the ordinal potential
function is also a pure strategy Nash equilibrium. Some further
discussions on the two spectrum access game models are listed below:

• Both the state-based and robust spectrum access games
have at least one pure strategy NE.

• For the state-based spectrum access game, the aggregate
weighted interference serves as the potential function, as
specified by (11). For the robust spectrum access game,
the expected aggregate weighted interference serves as the
potential function, as specified by (19). Thus, every NE of
the two games minimizes the (expected) aggregate
weighted interference with respect to unilateral deviations,
and the global minimizer is itself an NE. Furthermore, it has
been shown that lower weighted interference
leads to higher throughput [?]-[?]. Thus, it can be
expected that the two games would also achieve high
throughput.

B. Distributed learning for achieving Nash equilibria

As the expected aggregate interference serves as the potential
function for the robust spectrum access game, it is
desirable to develop distributed algorithms to achieve the Nash
equilibria. Conventional algorithms in the game community,
e.g., best response dynamics [24], fictitious play [26] and spatial
adaptive play [?], cannot be applied since they need information
exchange among the players. To eliminate the requirement
of information exchange, some distributed algorithms have
been applied in wireless applications, e.g., MAX-logit [28]
and Q-learning [29]. However, B-logit and MAX-logit are
only suitable for static environments; although Q-learning can
be applied in dynamic networks, its convergence in multiuser
environments cannot be guaranteed.

In this paper, we propose a learning-based distributed spectrum
access algorithm, which is mainly based on stochastic
learning automata [?], [?]. To begin with, denote
$\mathbf{q}_n(k) = \{q_{n1}(k), \ldots, q_{n|\mathcal{A}_n|}(k)\}$ as the mixed strategy of player $n$
in the $k$th slot, in which $q_{nm}$ is the probability of choosing
channel $m$. The key ideas of the proposed distributed learning
algorithm are: i) the active users choose channels according
to their mixed strategies and then update their mixed strategies
based on the received payoffs, and ii) an inactive user does
nothing. Specifically, the learning procedure is as follows:


Initialization: set $k = 1$ and set the initial mixed strategy of
each user as $q_{nm}(k) = 1/|\mathcal{A}_n|, \forall n \in \mathcal{N}, \forall m \in \mathcal{A}_n$.

Loop for $k = 1, 2, \ldots$

Denote $\mathcal{B}(k)$ as the active user set in the current slot.

1) Channel selection: according to its current mixed
strategy $\mathbf{q}_n(k)$, each active user $n$ in $\mathcal{B}(k)$ randomly chooses
a channel $a_n(k)$ from its available channel set $\mathcal{A}_n$ in slot $k$.

2) Channel access and transmission: all the active users transmit
on the chosen channels and receive the instantaneous
transmission throughput $R_n(k)$, which is determined by (3).

3) Mixed strategy update: all the active users $n \in \mathcal{B}(k)$
update their mixed strategies according to the following rules:

$$
\begin{aligned}
q_{nm}(k+1) &= q_{nm}(k) + b\, r_n(k) \left(1 - q_{nm}(k)\right), && m = a_n(k), \\
q_{nm}(k+1) &= q_{nm}(k) - b\, r_n(k)\, q_{nm}(k), && m \neq a_n(k),
\end{aligned}
\tag{23}
$$

where $b$ is the learning step size, and $r_n(k)$ is the normalized
received payoff defined as follows:

$$ r_n(k) = \frac{R_n(k)}{R_n^{\max}}, \tag{24} $$

where $R_n^{\max}$ is the interference-free transmission throughput
of user $n$, i.e., $R_n^{\max} = B \log\left(1 + \frac{P_n d_n^{-\alpha}}{\sigma}\right)$.

All the inactive users keep their mixed strategies unchanged,
i.e.,

$$ \mathbf{q}_n(k+1) = \mathbf{q}_n(k), \quad \forall n \in \mathcal{N} \setminus \mathcal{B}(k). \tag{25} $$

End loop
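
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `toy_payoff` is a hypothetical stand-in for the normalized payoff in (24), channels are referred to by their index in each user's strategy vector, and the two-user setting with uniform initial strategies is chosen only for brevity.

```python
import random

def choose_channel(q, rng):
    """Step 1: sample a channel index according to the mixed strategy q."""
    return rng.choices(range(len(q)), weights=q, k=1)[0]

def update_mixed_strategy(q, chosen, reward, b):
    """Step 3, rule (23): move probability mass toward the chosen channel."""
    return [
        q_m + b * reward * (1.0 - q_m) if m == chosen else q_m - b * reward * q_m
        for m, q_m in enumerate(q)
    ]

def learning_slot(strategies, lam, payoff_fn, b, rng):
    """One slot: active users play and learn; inactive users keep their
    strategies unchanged, as in (25)."""
    active = [n for n in range(len(strategies)) if rng.random() < lam[n]]
    actions = {n: choose_channel(strategies[n], rng) for n in active}
    for n in active:
        r_n = payoff_fn(n, actions)   # normalized payoff in [0, 1], cf. (24)
        strategies[n] = update_mixed_strategy(strategies[n], actions[n], r_n, b)
    return active, actions

def toy_payoff(n, actions):
    """Hypothetical payoff: 1 if user n is alone on its channel, else 0.3."""
    alone = all(actions[i] != actions[n] for i in actions if i != n)
    return 1.0 if alone else 0.3

rng = random.Random(1)
strategies = [[0.5, 0.5] for _ in range(2)]   # uniform initial strategies
for _ in range(2000):
    learning_slot(strategies, [0.8, 0.8], toy_payoff, b=0.1, rng=rng)
```

Each update leaves the strategy vector a valid probability distribution, since the mass added to the chosen channel equals the mass removed from the others. Note that a user only needs its own action and payoff, which is what makes the procedure fully distributed.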





The rationale behind the update rule (23) is as follows: when
a channel is selected and a positive payoff is received, the
probability of choosing that channel in the next step increases
while the probabilities of choosing other channels decrease
accordingly. When all the players adhere to this rule, the system
will finally converge to a stable state. Note that the above
learning algorithm is fully distributed since an active user only
needs its individual action-payoff information. Furthermore,
its asymptotic convergence property is characterized by the
following theorem.

Theorem 3. When the learning step size is sufficiently
small, i.e., $b \to 0$, the proposed distributed learning algorithm
asymptotically converges to a pure strategy NE point of the robust
spectrum access game $\mathcal{F}_2$.

Proof: It has been rigorously proved in [32] that stochastic
learning automata converge to pure strategy Nash equilibria of
exact potential games. In methodology, the differences
in this work are: i) all the users are always
active in [32], while they are randomly active or inactive in
this work, and ii) for exact potential games, the change in the
utility function caused by the unilateral action change of an
arbitrary user is the same as that in the potential function,
i.e.,

$$ u_n(a_n^*, a_{-n}) - u_n(a_n, a_{-n}) = \phi(a_n^*, a_{-n}) - \phi(a_n, a_{-n}). \tag{26} $$

When proving the convergence for exact potential games,
the following inequality is vital (see equation (C.40) in [32]):

$$ \left(u_n(a_n^*, a_{-n}) - u_n(a_n, a_{-n})\right) \left(\phi(a_n^*, a_{-n}) - \phi(a_n, a_{-n})\right) \geq 0. \tag{27} $$

Note that the above inequality also holds for ordinal potential
games (see equation (16)). Thus, following similar lines to
the proof given in [32] (Theorem 5), with some additional
modifications for handling the user active probability $\lambda_n$
[23], this theorem can be proved. To avoid unnecessary
repetition, the detailed proof is omitted. ■

IV. Simulation Results and Discussion 
Fig. 2. A network consisting of eight cognitive APs. By querying the
geo-location spectrum database, each AP knows its available channel set
and transmission power, e.g., the available channel set and
transmission power of AP 1 are {1,2} and 350 mW, respectively.

The simulation scenario follows the setting given in [?].
The cognitive APs are randomly located in a 500 m × 500 m
square area. There are $M = 5$ channels with bandwidth
$B = 6$ MHz, the noise power is $\sigma = -100$ dBm, and the
path loss factor is $\alpha = 4$. The distance between AP $n$ and its
associated boundary user is $d_n = 20$ m. By querying the
geo-location spectrum database, each AP operates with a specific
transmission power $P_n$ and has a different set of available
channels. The step size for the learning algorithm is $b = 0.1$.
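
To give a feel for the scale of these parameters, the interference-free throughput used for normalization in (24) can be computed for AP 1 of Fig. 2. This is a rough sketch under two assumptions that are not stated in the text: the logarithm is taken to base 2, and the path loss $d^{-\alpha}$ is applied with $d$ in metres. The helper names are hypothetical.

```python
import math

def dbm_to_watt(dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** (dbm / 10.0) / 1000.0

def interference_free_throughput(p_mw, d_m, alpha, noise_dbm, bw_hz):
    """R_n^max = B log2(1 + P_n d_n^-alpha / sigma), assuming a base-2 log."""
    snr = (p_mw / 1000.0) * d_m ** (-alpha) / dbm_to_watt(noise_dbm)
    return bw_hz * math.log2(1.0 + snr)

# AP 1 in Fig. 2: 350 mW, boundary user at 20 m, alpha = 4, -100 dBm noise, 6 MHz.
r_max = interference_free_throughput(350, 20, 4, -100, 6e6)
```

Under these assumptions the signal-to-noise ratio at the boundary user is very high, so the throughput loss seen in the simulations comes from co-channel interference rather than from noise.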

A. Convergence behavior 

We consider a specific network, which is shown in Fig. 2.
Due to their different locations, the action (available channel)
sets of the users are different. For example, the action sets
of users 1, 2, 3 and 4 are $\mathcal{A}_1 = \{1,2\}$, $\mathcal{A}_2 = \{2,3,4\}$,
$\mathcal{A}_3 = \{1,3,4\}$ and $\mathcal{A}_4 = \{3,4\}$, respectively. For presentation,
it is assumed that all the users have the same active probability
$\lambda = 0.8$. The convergence behavior of the proposed distributed
spectrum access algorithm is shown in Fig. 3. Specifically,
the channel selection probabilities of users 2, 3 and 4 are
presented. It is noted that the channel selection probabilities
of user 2 converge to {0,0,1} over its available channel set
{2,3,4}, which means that it finally chooses channel 4 for
data transmission. Similarly, the selection probabilities of
users 3 and 4 also finally converge. The results validate the
convergence of the proposed distributed learning algorithm.

Fig. 3. The convergence of the proposed distributed learning algorithm.

B. Throughput performance

Fig. 4. The expected throughput versus the active probabilities of the users.

For comparison, we consider four approaches: optimal,
best NE, worst NE and the distributed learning algorithm.
1) Optimal: problem P2 is solved directly in a centralized
manner. Since it is a combinatorial optimization
problem, we apply exhaustive search to obtain
the optimal solution. 2) Best NE and worst NE: by assuming
that information exchange among the users is available, the
best response algorithm [24] is applied to achieve a pure strategy
NE of the robust spectrum access game $\mathcal{F}_2$ in a distributed
manner. We carry out 500 independent trials and then take
the best one and the worst one, respectively. Note that the best
NE and worst NE serve as the upper and lower bounds of the
game. 3) The proposed distributed learning: in the absence of
information exchange and a centralized controller, all the users
adhere to the proposed distributed learning algorithm.
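
The two baselines that rely on global information, exhaustive search and best response dynamics, can be sketched generically as follows. The sketch is illustrative only: `toy_utility` is a hypothetical congestion-style stand-in (the negative number of co-channel users), not the expected throughput used in the simulations, and every user is treated as always active. The action sets are those of users 1 to 4 in Section IV-A.

```python
from itertools import product

def deviate(profile, n, a):
    """Return the profile in which user n unilaterally switches to channel a."""
    return profile[:n] + (a,) + profile[n + 1:]

def is_nash(profile, action_sets, utility):
    """True if no user can strictly improve its utility by deviating alone."""
    return all(
        utility(n, deviate(profile, n, a)) <= utility(n, profile)
        for n, actions in enumerate(action_sets)
        for a in actions
    )

def best_response_dynamics(start, action_sets, utility, max_rounds=1000):
    """Let users take turns playing a best response until nobody moves."""
    profile = tuple(start)
    for _ in range(max_rounds):
        changed = False
        for n, actions in enumerate(action_sets):
            best = max(actions, key=lambda a: utility(n, deviate(profile, n, a)))
            if utility(n, deviate(profile, n, best)) > utility(n, profile):
                profile = deviate(profile, n, best)
                changed = True
        if not changed:
            return profile
    raise RuntimeError("no equilibrium reached within max_rounds")

def exhaustive_search(action_sets, objective):
    """Centralized baseline: scan every joint channel selection profile."""
    return max(product(*action_sets), key=objective)

def toy_utility(n, profile):
    """Hypothetical utility: minus the number of other users on n's channel."""
    return -sum(1 for i, a in enumerate(profile) if i != n and a == profile[n])

def total_utility(profile):
    return sum(toy_utility(n, profile) for n in range(len(profile)))

action_sets = [(1, 2), (2, 3, 4), (1, 3, 4), (3, 4)]
ne = best_response_dynamics((1, 2, 1, 3), action_sets, toy_utility)
optimum = exhaustive_search(action_sets, total_utility)
```

Exhaustive search visits every joint profile, so its cost grows exponentially with the number of users. Best response dynamics needs each user to know the others' current choices in order to evaluate its utility, which is the information exchange that the learning algorithm avoids.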

First, we evaluate the throughput performance for the specific
network shown in Fig. 2. For presentation, all the users
have the same active probability. The expected throughput
when varying the user active probability is shown in Fig. 4.
The result of the proposed learning algorithm is obtained by
simulating 200 independent trials and taking the average. Some
important results can be observed from the figure: i) the best
NE is almost the same as the optimal solution, while the
throughput gap between the worst NE and the optimal solution
is small, which validates the effectiveness of the proposed
robust spectrum access game, and ii) the throughput of the proposed
distributed learning algorithm is very close to the optimal one.
Note that some similar numerical
results on the throughput comparison were reported in [22].

Secondly, we also consider the specific network shown in
Fig. 2, but with different active probabilities for the users.
Specifically, the active probabilities of the users are set to
{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8} (tagged as Scenario 1),
{0.3,0.3,0.3,0.6,0.6,0.9,0.9,0.9} (tagged as Scenario 2),
{0.3,0.4,0.5,0.5,0.5,0.6,0.7,0.8} (tagged as Scenario 3),
{0.4,0.4,0.4,0.6,0.6,0.6,0.6,0.6} (tagged as Scenario 4),
{0.3,0.6,0.6,0.6,0.6,0.6,0.6,0.7} (tagged as Scenario 5),
{0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.7} (tagged as Scenario 6),
respectively. The throughput comparison results are shown in
Fig. 5. It is also noted from the figure that for the scenarios with
heterogeneous active probabilities, the best NE is almost the
same as the optimal solution and the proposed distributed
learning algorithm is close to the optimal one. These results validate
the effectiveness of the formulated spectrum access game as
well as the proposed distributed learning algorithm, in both
homogeneous and heterogeneous scenarios.

Fig. 5. The throughput performance comparison for six scenarios with
heterogeneous active probabilities.

Thirdly, we evaluate the throughput performance for general
networks. Specifically, the cognitive APs are randomly located
in the square region. Each channel is independently vacant
with probability $\theta = 0.7$ for each AP, and the transmission
power of each AP is randomly chosen from the set
{100, 200, 250, 300, 350, 280, 400} mW. The throughput
comparison when varying the number of cognitive APs is shown
in Fig. 6. For each number of APs, e.g., $N = 10$, we simulate
100 independent topologies and take the average result. When
the network scales up, exhaustive search is not feasible due to
its heavy computational complexity. However, based on the
results above, the best NE is expected to be very close to the
optimal solution. The figure shows that the proposed distributed
learning algorithm is close to the best NE, which again validates
the proposed game-theoretic solution. Also, as the network scales
up, the normalized throughput decreases due to the increase in
mutual interference, as expected.

To summarize, the simulation results show that the best NE
is almost the same as the optimal solution, and that the throughput
of the proposed distributed learning algorithm is very close to
the optimal one. Recalling the dynamic and incomplete information
constraints in the considered system, we believe that the proposed
game-theoretic learning solution is desirable for practical
applications.

Fig. 6. The throughput performance comparison for general networks (the
active probabilities of all the users are $\lambda = 0.8$).


V. Conclusion 

This paper investigated the problem of database-assisted
spectrum access in dynamic networks, in which the active user
set varies over time. Since there is no central controller, the users
face dynamic and incomplete information constraints. To address
this challenge, we formulated a state-based spectrum access
game and a robust spectrum access game. We proved that the
two games are ordinal potential games with the (expected)
aggregate weighted interference serving as the potential functions,
and proposed a distributed learning algorithm without information
exchange to achieve the pure strategy Nash equilibria (NE)
of the games. Simulation results show that the best NE is
almost the same as the optimal solution and that the achievable
throughput of the proposed learning algorithm is very close
to the optimal one, which validates the effectiveness of the
proposed game-theoretic solution. Some new challenges and
open issues remain to be studied. For example,
users can access more than one channel when equipped
with multiple-radio technology. Work on game-theoretic carrier
aggregation in unlicensed spectrum bands is ongoing and will
be reported soon.